227 research outputs found
Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Insufficient or even unavailable training data for emerging classes is a major
challenge in many classification tasks, including text classification.
Recognising text documents of classes that have never been seen in the learning
stage, so-called zero-shot text classification, is therefore difficult, and only
a few previous works have tackled this problem. In this paper, we propose a
two-phase framework together with data augmentation and feature augmentation to
solve this problem. Four kinds of semantic knowledge (word embeddings, class
descriptions, class hierarchy, and a general knowledge graph) are incorporated
into the proposed framework to deal with instances of unseen classes
effectively. Experimental results show that each of the two phases, as well as
their combination, achieves the best overall accuracy compared with baselines and recent
approaches in classifying real-world texts under the zero-shot scenario.
Comment: Accepted NAACL-HLT 201
Vector-based Efficient Data Hiding in Encrypted Images via Multi-MSB Replacement
As an essential technique for data privacy protection, reversible data hiding
in encrypted images (RDHEI) methods have drawn intensive research interest in
recent years. In response to the increasing demand for protecting data privacy,
novel methods that perform RDHEI are continually being developed. We propose
two effective multi-MSB (most significant bit) replacement-based approaches
that yield comparably high data embedding capacity, improve overall processing
speed, and enhance reconstructed images' quality. Our first method, Efficient
Multi-MSB Replacement-RDHEI (EMR-RDHEI), obtains higher data embedding rates
(DERs, also known as payloads) and better visual quality in reconstructed
images when compared with many other state-of-the-art methods. Our second
method, Lossless Multi-MSB Replacement-RDHEI (LMR-RDHEI), can losslessly
recover original images after an information embedding process is performed. To
verify the accuracy of our methods, we compared them with other recent RDHEI
techniques and performed extensive experiments using the widely accepted BOWS-2
dataset. Our experimental results showed that the DER of our EMR-RDHEI method
ranged from 1.2087 bits per pixel (bpp) to 6.2682 bpp with an average of 3.2457
bpp. For the LMR-RDHEI method, the average DER was 2.5325 bpp, with a range
between 0.2129 bpp and 6.0168 bpp. Our results demonstrate that these methods
outperform many other state-of-the-art RDHEI algorithms. Additionally, the
multi-MSB replacement-based approach provides a clean design and efficient
vectorized implementation.
Comment: 14 pages; journa
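As a rough illustration of the multi-MSB replacement idea described in the abstract above, the following Python sketch overwrites the k most significant bits of one 8-bit pixel with payload bits and computes an embedding rate in bpp. It is not the EMR-RDHEI or LMR-RDHEI algorithm: encryption, prediction, and recovery of the original image are all omitted, and the function names `replace_msbs`, `extract_msbs`, and `embedding_rate` are hypothetical.

```python
def replace_msbs(pixel, payload_bits, k):
    """Overwrite the k most significant bits of an 8-bit pixel with payload_bits."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    if not 1 <= k <= 8 or len(payload_bits) != k:
        raise ValueError("payload_bits must contain exactly k bits, 1 <= k <= 8")
    # Pack the payload bits (most significant first) into an integer.
    payload = 0
    for bit in payload_bits:
        payload = (payload << 1) | (bit & 1)
    # Keep the (8 - k) low-order bits of the pixel and place the payload on top.
    low_mask = (1 << (8 - k)) - 1
    return (payload << (8 - k)) | (pixel & low_mask)


def extract_msbs(pixel, k):
    """Read back the k most significant bits of an 8-bit pixel."""
    return [(pixel >> (7 - i)) & 1 for i in range(k)]


def embedding_rate(embedded_bits, num_pixels):
    """Data embedding rate in bits per pixel (bpp)."""
    return embedded_bits / num_pixels


marked = replace_msbs(0b10110101, [0, 1, 1], k=3)
print(marked)                   # 117, i.e. 0b01110101
print(extract_msbs(marked, 3))  # [0, 1, 1]
```

Under this definition, the reported DER values are simply the number of embedded bits divided by the number of pixels in the image.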
Self-supervised Registration and Segmentation of the Ossicles with A Single Ground Truth Label
AI-assisted surgeries have drawn the attention of the medical image research
community due to their real-world impact on improving surgery success rates.
For image-guided surgeries, such as Cochlear Implants (CIs), accurate object
segmentation can provide useful information for surgeons before an operation.
Recently published image segmentation methods that leverage machine learning
usually rely on a large number of manually predefined ground truth labels.
However, it is a laborious and time-consuming task to prepare the dataset. This
paper presents a novel technique using a self-supervised 3D-UNet that produces
a dense deformation field between an atlas and a target image that can be used
for atlas-based segmentation of the ossicles. Our results show that our method
outperforms traditional image segmentation methods and generates a more
accurate boundary around the ossicles based on Dice similarity coefficient and
point-to-point error comparison. The mean Dice coefficient is improved by 8.51%
with our proposed method.
Comment: conferenc
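For readers unfamiliar with the metric cited above, here is a minimal Python sketch of the Dice similarity coefficient, 2|A ∩ B| / (|A| + |B|), on two flattened binary masks. The function name `dice_coefficient` and the toy masks are illustrative assumptions, not data or code from the paper.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two equal-length binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled as foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted = [0, 1, 1, 1, 0, 0]  # toy segmentation output
reference = [0, 0, 1, 1, 1, 0]  # toy ground-truth label
print(dice_coefficient(predicted, reference))  # 2 * 2 / (3 + 3)
```

A value of 1.0 means the two masks overlap perfectly, and 0.0 means they share no foreground voxels.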
OmiEmbed: a unified multi-task deep learning framework for multi-omics data
High-dimensional omics data contains intrinsic biomedical information that is
crucial for personalised medicine. Nevertheless, it is challenging to capture
this information from genome-wide data due to the large number of molecular features
and the small number of available samples, a problem known as 'the curse of
dimensionality' in machine learning. To tackle this problem and pave the way
for machine learning aided precision medicine, we proposed a unified multi-task
deep learning framework named OmiEmbed to capture biomedical information from
high-dimensional omics data with the deep embedding and downstream task
modules. The deep embedding module learnt an omics embedding that mapped
multiple omics data types into a latent space with lower dimensionality. Based
on the new representation of multi-omics data, different downstream task
modules were trained simultaneously and efficiently with the multi-task
strategy to predict the comprehensive phenotype profile of each sample.
OmiEmbed supports multiple tasks for omics data including dimensionality
reduction, tumour type classification, multi-omics integration, demographic and
clinical feature reconstruction, and survival prediction. The framework
outperformed other methods on all three types of downstream tasks and achieved
better performance with the multi-task strategy than when training them
individually. OmiEmbed is a powerful and unified framework that can be widely
adapted to various applications of high-dimensional omics data and has great
potential to facilitate more accurate and personalised clinical decision
making.
Comment: 14 pages, 8 figures, 7 table
Internet Platform Enterprises and Farmers Digital Literacy Improvement
Agricultural and rural informatization is the strategic high ground of agricultural and rural modernization, and improving farmers' digital literacy is the core task of digital rural construction. In the digital age, the development of the platform economy has given Internet platform enterprises a new social responsibility: improving farmers' digital literacy. By strengthening platform governance to increase farmers' willingness to use digital technology, promoting technology for good to improve farmers' digital skills, and innovating digital services to improve the outcomes of farmers' digital use, Internet platform enterprises ultimately improve farmers' digital literacy.
Integrated Multi-omics Analysis Using Variational Autoencoders: Application to Pan-cancer Classification
Different aspects of a clinical sample can be revealed by multiple types of
omics data. Integrated analysis of multi-omics data provides a comprehensive
view of patients, which has the potential to facilitate more accurate clinical
decision making. However, omics data are normally high dimensional, with a large
number of molecular features and a relatively small number of available samples
with clinical labels. The "dimensionality curse" makes it challenging to train
a machine learning model using high dimensional omics data like DNA methylation
and gene expression profiles. Here we propose an end-to-end deep learning model
called OmiVAE to extract low dimensional features and classify samples from
multi-omics data. OmiVAE combines the basic structure of variational
autoencoders with a classification network to achieve task-oriented feature
extraction and multi-class classification. The training procedure of OmiVAE is
comprised of an unsupervised phase without the classifier and a supervised
phase with the classifier. During the unsupervised phase, a hierarchical
cluster structure of samples can be automatically formed without the need for
labels. In the supervised phase, OmiVAE achieved an average classification
accuracy of 97.49% after 10-fold cross-validation among 33 tumour types and
normal samples, which shows better performance than other existing methods. The
OmiVAE model learned from multi-omics data outperformed that using only one
type of omics data, which indicates that the complementary information from
different omics datatypes provides useful insights for biomedical tasks like
cancer classification.
Comment: 7 pages, 4 figure
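The averaged accuracy quoted above comes from 10-fold cross-validation. As a hedged sketch of that evaluation protocol only (not the OmiVAE training code), the Python below partitions sample indices into k contiguous folds and averages per-fold accuracies. The helper names `k_fold_indices` and `mean_accuracy` are hypothetical, and shuffling or stratification by tumour type, which a real experiment would likely use, is omitted.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def mean_accuracy(fold_accuracies):
    """Average the accuracy measured on each held-out fold."""
    return sum(fold_accuracies) / len(fold_accuracies)


for train_idx, test_idx in k_fold_indices(n_samples=25, k=10):
    # A model would be fitted on train_idx and scored on test_idx here.
    print(len(train_idx), len(test_idx))
```

Each sample is held out exactly once, so the reported figure is the mean of ten held-out accuracies rather than a single train/test split.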
Unsupervised Annotation of Phenotypic Abnormalities via Semantic Latent Representations on Electronic Health Records
The extraction of phenotype information, which is naturally contained in
electronic health records (EHRs), has been found to be useful in various
clinical informatics applications such as disease diagnosis. However, due to
imprecise descriptions, lack of gold standards and the demand for efficiency,
annotating phenotypic abnormalities on millions of EHR narratives is still
challenging. In this work, we propose a novel unsupervised deep learning
framework to annotate the phenotypic abnormalities from EHRs via semantic
latent representations. The proposed framework takes advantage of the Human
Phenotype Ontology (HPO), which is a knowledge base of phenotypic
abnormalities, to standardize the annotation results. Experiments have been
conducted on 52,722 EHRs from the MIMIC-III dataset. Quantitative and qualitative
analyses have shown that the proposed framework achieves state-of-the-art annotation
performance and computational efficiency compared with other methods.
Comment: Accepted by BIBM 2019 (Regular